Making the codebook

First, we set a few knitr chunk options and a default ggplot2 theme.

knitr::opts_chunk$set(
  warning = TRUE, # show warnings during codebook generation
  message = TRUE, # show messages during codebook generation
  error = TRUE, # do not interrupt codebook generation in case of errors,
                # usually better for debugging
  echo = TRUE  # show R code
)
ggplot2::theme_set(ggplot2::theme_bw())
## Warning: replacing previous import 'vctrs::data_frame' by 'tibble::data_frame'
## when loading 'dplyr'

Next, we load the packages, import the data and its dictionary, and attach variable labels and dataset metadata.

library(codebook)
webshot::install_phantomjs()
## It seems that the version of `phantomjs` installed is greater than or equal to the requested version.To install the requested version or downgrade to another version, use `force = TRUE`.
library(labelled)
## 
## Attaching package: 'labelled'
## The following object is masked from 'package:codebook':
## 
##     to_factor
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# codebook_data <- codebook::bfi
# to import an SPSS file from the same folder uncomment and edit the line below
# codebook_data <- rio::import("mydata.sav")
# for Stata
# codebook_data <- rio::import("mydata.dta")
# for CSV
codebook_data <- rio::import("peril_reliability_deid.csv") 

codebook_dictionary <- rio::import("peril_reliability_deid_codebook.csv")

var_label(codebook_data) <- codebook_dictionary %>% select(variable, label) %>% dict_to_list()

metadata(codebook_data)$name <- 'Reliability Dataset Codebook'
metadata(codebook_data)$description <- "Reliability data associated with paper 'Dangerous ground: One-year-old infants are sensitive to peril in other agents’ action plans'"
metadata(codebook_data)$creator <- "Shari Liu"
metadata(codebook_data)$datePublished <- "2022-04-12"

# omit the following lines if your missing values are already properly labelled
# codebook_data <- detect_missing(codebook_data,
#     only_labelled = TRUE, # only labelled values are autodetected as
#                                    # missing
#     negative_values_are_missing = FALSE, # if TRUE, negative values are
#                                    # treated as missing values
#     ninety_nine_problems = TRUE,   # 99/999 are missing values, if they
#                                    # are more than 5 MAD from the median
#     )

# If you are not using formr, the codebook package needs to guess which items
# form a scale. The following line finds item aggregates with names like this:
# scale = scale_1 + scale_2R + scale_3R
# identifying these aggregates allows the codebook function to
# automatically compute reliabilities.
# However, it will not reverse items automatically.
# codebook_data <- detect_scales(codebook_data)
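
The labelling step in the chunk above turns the two-column dictionary into a named list and assigns it to the data with var_label(). The following is a minimal base-R sketch of that idea, not the packages' actual implementation. The objects toy_data and toy_dictionary are hypothetical stand-ins for the real files, and the sketch assumes that labels live in each column's "label" attribute, which is the attribute labelled::var_label() reads.

```r
# Hypothetical stand-ins for the imported data and dictionary
toy_data <- data.frame(
  subj = c("s1", "s2"),
  trialn = c(1, 2),
  stringsAsFactors = FALSE
)
toy_dictionary <- data.frame(
  variable = c("subj", "trialn"),
  label = c("de-identified subject id", "index of trial"),
  stringsAsFactors = FALSE
)

# Named list with one label per variable name
# (roughly the shape that dict_to_list() produces)
label_list <- setNames(as.list(toy_dictionary$label), toy_dictionary$variable)

# Stop early if the dictionary names a variable that is not in the data
unknown_vars <- setdiff(names(label_list), names(toy_data))
stopifnot(length(unknown_vars) == 0)

# Attach each label as the column's "label" attribute
for (v in names(label_list)) {
  attr(toy_data[[v]], "label") <- label_list[[v]]
}

attr(toy_data$subj, "label")
```

Checking the dictionary against the column names before labelling catches renamed or misspelled variables before they reach the codebook.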

Create codebook

skim_codebook(codebook_data)
Data summary
| Name | data |
|:--|:--|
| Number of rows | 408 |
| Number of columns | 9 |
| Column type frequency: | |
| character | 6 |
| numeric | 3 |
| Group variables | None |

Variable type: character

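| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|:--|--:|--:|--:|--:|--:|--:|--:|
| subj | 0 | 1 | 1 | 6 | 0 | 102 | 0 |
| experiment.orig | 0 | 1 | 3 | 5 | 0 | 6 | 0 |
| experiment.paper | 0 | 1 | 4 | 11 | 0 | 6 | 0 |
| experiment.new | 0 | 1 | 12 | 19 | 0 | 6 | 0 |
| trial | 0 | 1 | 5 | 5 | 0 | 4 | 0 |
| coder | 0 | 1 | 5 | 7 | 0 | 3 | 0 |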

Variable type: numeric

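| skim_variable | n_missing | complete_rate | mean | sd | min | median | max | hist |
|:--|--:|--:|--:|--:|--:|--:|--:|:--|
| orig.look | 16 | 0.96 | 20.66 | 16.25 | 0.95 | 14.59 | 60 | ▇▆▂▂▂ |
| trialn | 0 | 1.00 | 2.50 | 1.12 | 1.00 | 2.50 | 4 | ▇▇▁▇▇ |
| secondary.look | 12 | 0.97 | 20.76 | 16.39 | 0.95 | 14.59 | 60 | ▇▅▂▂▂ |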
codebook(codebook_data)

Metadata

Description

Dataset name: Reliability Dataset Codebook

Reliability data associated with paper ‘Dangerous ground: One-year-old infants are sensitive to peril in other agents’ action plans’

Metadata for search engines

  • Date published: 2022-04-12

  • Creator: Shari Liu

  • Keywords: subj, experiment.orig, experiment.paper, experiment.new, trial, orig.look, trialn, secondary.look, coder

Variables

subj

de-identified subject id

Distribution

Distribution of values for subj

0 missing values.

Summary statistics

| name | label | data_type | n_missing | complete_rate | n_unique | empty | min | max | whitespace |
|:--|:--|:--|--:|--:|--:|--:|--:|--:|--:|
| subj | de-identified subject id | character | 0 | 1 | 102 | 0 | 1 | 6 | 0 |

experiment.orig

original name of experiment

Distribution

Distribution of values for experiment.orig

0 missing values.

Summary statistics

| name | label | data_type | n_missing | complete_rate | n_unique | empty | min | max | whitespace |
|:--|:--|:--|--:|--:|--:|--:|--:|--:|--:|
| experiment.orig | original name of experiment | character | 0 | 1 | 6 | 0 | 3 | 5 | 0 |

experiment.paper

name of experiment used in paper

Distribution

Distribution of values for experiment.paper

0 missing values.

Summary statistics

| name | label | data_type | n_missing | complete_rate | n_unique | empty | min | max | whitespace |
|:--|:--|:--|--:|--:|--:|--:|--:|--:|--:|
| experiment.paper | name of experiment used in paper | character | 0 | 1 | 6 | 0 | 4 | 11 | 0 |

experiment.new

specific name of sample to distinguish between older and younger infants

Distribution

Distribution of values for experiment.new

0 missing values.

Summary statistics

| name | label | data_type | n_missing | complete_rate | n_unique | empty | min | max | whitespace |
|:--|:--|:--|--:|--:|--:|--:|--:|--:|--:|
| experiment.new | specific name of sample to distinguish between older and younger infants | character | 0 | 1 | 6 | 0 | 12 | 19 | 0 |

trial

which test trial (test1-test4)

Distribution

Distribution of values for trial

0 missing values.

Summary statistics

| name | label | data_type | n_missing | complete_rate | n_unique | empty | min | max | whitespace |
|:--|:--|:--|--:|--:|--:|--:|--:|--:|--:|
| trial | which test trial (test1-test4) | character | 0 | 1 | 4 | 0 | 5 | 5 | 0 |

orig.look

looking time generated by original offline coding (used in analysis)

Distribution

Distribution of values for orig.look

16 missing values.

Summary statistics

| name | label | data_type | n_missing | complete_rate | min | median | max | mean | sd | hist |
|:--|:--|:--|--:|--:|--:|--:|--:|--:|--:|:--|
| orig.look | looking time generated by original offline coding (used in analysis) | numeric | 16 | 0.9607843 | 0.95 | 15 | 60 | 20.66207 | 16.25162 | ▇▆▂▂▂ |

trialn

index of trial

Distribution

Distribution of values for trialn

0 missing values.

Summary statistics

| name | label | data_type | n_missing | complete_rate | min | median | max | mean | sd | hist |
|:--|:--|:--|--:|--:|--:|--:|--:|--:|--:|:--|
| trialn | index of trial | numeric | 0 | 1 | 1 | 2.5 | 4 | 2.5 | 1.119407 | ▇▇▁▇▇ |

secondary.look

looking time generated by second coder

Distribution

Distribution of values for secondary.look

12 missing values.

Summary statistics

| name | label | data_type | n_missing | complete_rate | min | median | max | mean | sd | hist |
|:--|:--|:--|--:|--:|--:|--:|--:|--:|--:|:--|
| secondary.look | looking time generated by second coder | numeric | 12 | 0.9705882 | 0.95 | 15 | 60 | 20.75837 | 16.38608 | ▇▅▂▂▂ |

coder

who did the secondary coding

Distribution

Distribution of values for coder

0 missing values.

Summary statistics

| name | label | data_type | n_missing | complete_rate | n_unique | empty | min | max | whitespace |
|:--|:--|:--|--:|--:|--:|--:|--:|--:|--:|
| coder | who did the secondary coding | character | 0 | 1 | 3 | 0 | 5 | 7 | 0 |
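
The n_missing and complete_rate columns in the summaries above are two views of the same quantity: complete_rate is the share of rows that are not missing. As a quick arithmetic check, the sketch below recomputes it from the reported counts (408 rows, with 16 and 12 missing values for the two looking-time variables). The helper name complete_rate_of is hypothetical and only illustrates the same calculation on a raw vector.

```r
# Counts taken from the summaries in this codebook
n_rows <- 408
n_missing <- c(orig.look = 16, secondary.look = 12)

# Share of non-missing rows per variable
complete_rate <- 1 - n_missing / n_rows
round(complete_rate, 7)
# orig.look: 0.9607843, secondary.look: 0.9705882

# Hypothetical helper: the same quantity computed from a raw vector
complete_rate_of <- function(x) mean(!is.na(x))
complete_rate_of(c(1.5, NA, 3.0, 4.2))
# 0.75
```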

Missingness report

Codebook table

JSON-LD metadata

The following JSON-LD can be found by search engines if you share this codebook publicly on the web.

{
  "name": "Reliability Dataset Codebook",
  "description": "Reliability data associated with paper 'Dangerous ground: One-year-old infants are sensitive to peril in other agents’ action plans'\n\n\n## Table of variables\nThis table contains variable names, labels, and number of missing values.\nSee the complete codebook for more.\n\n|name             |label                                                                    | n_missing|\n|:----------------|:------------------------------------------------------------------------|---------:|\n|subj             |de-identified subject id                                                 |         0|\n|experiment.orig  |original name of experiment                                              |         0|\n|experiment.paper |name of experiment used in paper                                         |         0|\n|experiment.new   |specific name of sample to distinguish between older and younger infants |         0|\n|trial            |which test trial (test1-test4)                                           |         0|\n|orig.look        |looking time generated by original offline coding (used in analysis)     |        16|\n|trialn           |index of trial                                                           |         0|\n|secondary.look   |looking time generated by second coder                                   |        12|\n|coder            |who did the secondary coding                                             |         0|\n\n### Note\nThis dataset was automatically described using the [codebook R package](https://rubenarslan.github.io/codebook/) (version 0.9.2).",
  "creator": "Shari Liu",
  "datePublished": "2022-04-12",
  "keywords": ["subj", "experiment.orig", "experiment.paper", "experiment.new", "trial", "orig.look", "trialn", "secondary.look", "coder"],
  "@context": "http://schema.org/",
  "@type": "Dataset",
  "variableMeasured": [
    {
      "name": "subj",
      "description": "de-identified subject id",
      "@type": "propertyValue"
    },
    {
      "name": "experiment.orig",
      "description": "original name of experiment",
      "@type": "propertyValue"
    },
    {
      "name": "experiment.paper",
      "description": "name of experiment used in paper",
      "@type": "propertyValue"
    },
    {
      "name": "experiment.new",
      "description": "specific name of sample to distinguish between older and younger infants",
      "@type": "propertyValue"
    },
    {
      "name": "trial",
      "description": "which test trial (test1-test4)",
      "@type": "propertyValue"
    },
    {
      "name": "orig.look",
      "description": "looking time generated by original offline coding (used in analysis)",
      "@type": "propertyValue"
    },
    {
      "name": "trialn",
      "description": "index of trial",
      "@type": "propertyValue"
    },
    {
      "name": "secondary.look",
      "description": "looking time generated by second coder",
      "@type": "propertyValue"
    },
    {
      "name": "coder",
      "description": "who did the secondary coding",
      "@type": "propertyValue"
    }
  ]
}